Effect of Hunting on Red Deer

P15.2 Fortgeschrittenes Praxisprojekt

Nikolai German, Thomas Witzani, Ziqi Xu, Zhengchen Yuan, Baisu Zhou

Dr. Nicolas Ferry - Bavarian Forest National Park / Daniel Schlichting - StabLab

31 Jan 2025

Agenda

  1. The Background
  2. The Data
  3. The Models
  4. The Wrap-up

Motivation

  • Hunting activities have a numerical effect on animal populations
  • Additionally, hunting can have non-lethal effects
  • Goal: assess short-term stress response in red deer towards hunting events at the Bavarian Forest National Park

Data-Generating Process

  • A deer roams freely in the Bavarian Forest National Park
  • Its movement is tracked by a GPS collar
  • A hunting event happens
  • After some time, the deer defecates (the defecation event)
  • Subsequently, researchers go to the defecation location and collect a faecal sample

FCMs as a Measure of Stress

  • Faecal Cortisol Metabolites (FCM) are substances found in the faeces of animals
  • The FCM level is used to measure previous stress: higher stress \(\Rightarrow\) higher FCM level
  • Stress \(\Rightarrow\) secretion of certain hormones \(\Rightarrow\) gut retention \(\Rightarrow\) FCM
  • Gut retention time \(\approx\) 19 hours
  • Once defecated, FCM levels decay over time

Huber et al. (2003)

Research Questions

  • What is the effect of temporal and spatial distance to hunting events on FCM levels?
  • Does the time between defecation event and sample collection affect FCM levels?

Approach

  • Model FCM levels as a function of spatial and temporal distance to hunting activities, alongside other covariates

  • Expectations:

    • FCM levels are higher when the hunting event is closer in time and space
    • FCM levels are lower the more time passes between defecation and sampling

Agenda

  1. The Background
  2. The Data
  3. The Models
  4. The Wrap-up

The Datasets

  • FCM Data
  • Hunting Events
  • Movement Data

FCM Data

Contains information on 809 faecal samples, including:

  • the FCM level [ng/g],
  • the time and location of sampling,
  • to which deer the sample belongs,
  • when the defecation happened.

Samples were taken at irregular time intervals from 2020 to 2022.

Hunting Events

  • Contains location and time of \(\geq\) 700 hunting events from 2020 to 2022.
  • 519 hunting events have complete location and time information.

Movement Data

  • Contains the location of the 40 collared deer from Feb. 2020 to Feb. 2023.
  • Movement is tracked at hourly intervals.

Limited Data, Large Uncertainty

  • Hunting events are single points in time and space.
  • Deer locations at hourly intervals \(\Rightarrow\) exact distances unknown \(\Rightarrow\) approximation needed, large uncertainty!
  • Each deer encountered only a few hunting events.

Other sources of uncertainty include:

  • lack of information about hunting events (does the single recorded time point mark the start, the middle, or the end of the hunt?),

  • unknown characteristics of the deer (e.g., age, health, etc.),

  • other unknown stressors (e.g., predators, human activities, weather, etc.),

  • unknown geographical features (e.g., terrain could affect the propagation of sound).

Distance Approximation

The deer's location at the time of a hunting event is approximated by linear interpolation between the hourly GPS fixes recorded before and after the event.
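A minimal sketch of such an interpolation, assuming positions are given in a projected coordinate system (metres) and that the hunting event falls between two consecutive fixes. The function name and the example coordinates are hypothetical and not taken from the project code.

```python
from datetime import datetime


def interpolate_position(t0, p0, t1, p1, t_event):
    """Linearly interpolate an (x, y) position between two GPS fixes.

    p0 and p1 are positions at times t0 and t1; t_event must lie in [t0, t1].
    """
    span = (t1 - t0).total_seconds()
    if span <= 0:
        raise ValueError("t1 must be later than t0")
    w = (t_event - t0).total_seconds() / span  # fraction of the interval elapsed
    if not 0.0 <= w <= 1.0:
        raise ValueError("t_event must lie between the two fixes")
    return (p0[0] + w * (p1[0] - p0[0]), p0[1] + w * (p1[1] - p0[1]))


# Hypothetical fixes one hour apart; the hunting event happens at 10:15.
t_a, pos_a = datetime(2021, 10, 1, 10, 0), (0.0, 0.0)
t_b, pos_b = datetime(2021, 10, 1, 11, 0), (400.0, 200.0)
event_time = datetime(2021, 10, 1, 10, 15)

x, y = interpolate_position(t_a, pos_a, t_b, pos_b, event_time)
print(x, y)  # a quarter of the way from pos_a to pos_b
```

The deer may of course have taken any path between the two fixes, which is one reason the resulting distances carry large uncertainty.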

Relevant Hunting Events

A hunting event is considered relevant to a faecal sample if

  • the time difference between hunting and defecation is between the gut retention time (GRT) thresholds, and
  • the distance between the deer and the hunting event is \(\leq\) distance threshold.

In this presentation:

  • GRT thresholds = (0, 36) hours,
  • distance threshold = 10 km.
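The two conditions can be sketched as a simple filter. This is an illustration only: the class and function names are hypothetical, the distance threshold is assumed to be measured in kilometres, and treating the GRT bounds as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    event_id: str
    time_diff_h: float  # defecation time minus hunting time [hours]
    distance_km: float  # approximated deer-to-event distance [km]


def relevant_events(candidates, grt_low_h=0.0, grt_high_h=36.0, max_distance_km=10.0):
    """Keep events inside the GRT window and within the distance threshold."""
    return [
        c for c in candidates
        if grt_low_h <= c.time_diff_h <= grt_high_h and c.distance_km <= max_distance_km
    ]


# Hypothetical candidates for one faecal sample.
candidates = [
    Candidate("A", time_diff_h=18.0, distance_km=2.5),   # relevant
    Candidate("B", time_diff_h=40.0, distance_km=1.0),   # outside the GRT window
    Candidate("C", time_diff_h=20.0, distance_km=14.0),  # too far away
    Candidate("D", time_diff_h=-3.0, distance_km=0.5),   # hunt after defecation
]
kept = [c.event_id for c in relevant_events(candidates)]
print(kept)
```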

[Figure: hunting events plotted by time difference and distance relative to the deer, with the GRT low and high thresholds, the 19-hour GRT target, and the distance threshold marked.]

The Most Relevant Hunting Event

Among the relevant hunting events, the most relevant one is defined by one of the three proximity criteria:

  • closest in time (to the GRT target of 19 hours),
  • nearest (smallest spatial distance),
  • highest score.
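As an illustration of how the three criteria can pick different events, here is a small sketch. The event tuples and the function name are hypothetical, and the score is passed in as a callable because its definition is a separate modelling choice.

```python
GRT_TARGET_H = 19.0  # GRT target from the slides [hours]

# Hypothetical relevant events: (event_id, time_diff_h, distance_km)
events = [("A", 30.0, 1.0), ("B", 18.0, 6.0), ("C", 12.0, 3.0)]


def most_relevant(events, criterion, score=None):
    """Select one event according to the chosen proximity criterion."""
    if not events:
        raise ValueError("no relevant hunting events")
    if criterion == "closest_in_time":
        return min(events, key=lambda e: abs(e[1] - GRT_TARGET_H))
    if criterion == "nearest":
        return min(events, key=lambda e: e[2])
    if criterion == "highest_score":
        if score is None:
            raise ValueError("a score function score(distance, time_diff) is required")
        return max(events, key=lambda e: score(e[2], e[1]))
    raise ValueError(f"unknown criterion: {criterion}")


by_time = most_relevant(events, "closest_in_time")
by_distance = most_relevant(events, "nearest")
print(by_time[0], by_distance[0])
```

In this toy example the event closest in time to the 19-hour target is not the spatially nearest one, so the resulting datasets can pair the same sample with different hunting events.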

[Figure: the same plot of hunting events by time difference and distance, highlighting the event selected by each criterion: nearest, highest score, and closest in time (to 19 hours).]

The Scoring Function

We define the scoring function as follows:

\[ S(d, t) \propto \frac{1}{d^2} \cdot f(t), \qquad f(t) = \begin{cases} \phi(t \mid \mu, \sigma^2) & \text{if } t \leq \mu, \\ \ell(t \mid \mu, b) & \text{if } t > \mu, \end{cases} \] where:

\[ \begin{aligned} d & \text{: Distance} \\ t & \text{: Time difference} \\ \mu & \text{: GRT target = 19 hours} \\ \phi(\cdot \mid \mu, \sigma^2) & \text{: density of } \mathcal{N}(\mu, \sigma^2) \\ \ell(\cdot \mid \mu, b) & \text{: density of } \mathrm{Laplace}(\mu, b) \end{aligned} \]
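A minimal sketch of this score, reading the time factor as a normal density for time differences up to the GRT target and a Laplace density beyond it. The values of `sigma` and `b` below are placeholders chosen for illustration; the slides do not state the values that were used.

```python
import math


def normal_pdf(t, mu, sigma):
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def laplace_pdf(t, mu, b):
    return math.exp(-abs(t - mu) / b) / (2.0 * b)


def score(distance_km, time_diff_h, mu=19.0, sigma=4.0, b=6.0):
    """Unnormalized score: inverse squared distance times a piecewise time density.

    sigma and b are hypothetical scale parameters.
    """
    if distance_km <= 0:
        raise ValueError("distance must be positive")
    if time_diff_h <= mu:
        time_factor = normal_pdf(time_diff_h, mu, sigma)
    else:
        time_factor = laplace_pdf(time_diff_h, mu, b)
    return time_factor / distance_km ** 2


print(score(1.0, 19.0), score(2.0, 19.0), score(1.0, 30.0))
```

Doubling the distance divides the score by four, while the Laplace tail lets events older than 19 hours lose weight more slowly than a normal tail would. The two pieces meet continuously at \(\mu\) only if \(b = \sigma\sqrt{\pi/2}\); whether this was enforced is not stated here.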

The Scoring Function

[Figure: marginal effects of distance and of elapsed time since the hunting event on the score.]

The Fused Data

We report models fitted on the following datasets:

Dataset  Proximity criterion  Deer  Observations
1        closest in time      35    149
2        nearest              35    147
3        highest score        36    223

Agenda

  1. The Background
  2. The Data
  3. The Models
  4. The Wrap-up

The Models

For modelling, we consider the following covariates, defined for each pair of FCM sample and most relevant hunting event:

  • Time difference: time of defecation - time of hunting event [hours]
  • Distance: distance between deer and hunting event [km]
  • Sample delay: time of sample collection - time of defecation [hours]
  • Defecation day (day of year as integer)
  • Number of other relevant hunting events
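The time-based covariates follow directly from three timestamps. The sketch below uses hypothetical timestamps and a hypothetical function name to show the arithmetic.

```python
from datetime import datetime


def time_covariates(hunt_time, defecation_time, sample_time):
    """Derive the time-based covariates for one sample/event pair."""
    def hours(delta):
        return delta.total_seconds() / 3600.0

    return {
        "time_difference_h": hours(defecation_time - hunt_time),
        "sample_delay_h": hours(sample_time - defecation_time),
        "defecation_day": defecation_time.timetuple().tm_yday,  # day of year
    }


row = time_covariates(
    hunt_time=datetime(2021, 10, 1, 8, 0),
    defecation_time=datetime(2021, 10, 2, 3, 0),
    sample_time=datetime(2021, 10, 2, 9, 30),
)
print(row)
```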

The Models

We chose two different approaches to modelling:

  1. Statistical modelling: a model that helps to understand the effects of our covariates, here a Generalized Additive Mixed Model (GAMM)
  2. Machine learning: a model that focuses on prediction, in our case an XGBoost model

A. Generalized Additive Mixed Model

  • Family: Gamma

  • Let \(i = 1,\dots,N\) be the indices of deer and \(j = 1,\dots,n_i\) be the indices of faecal samples for each deer

    \[ \begin{aligned} \textup{FCM}_{ij} \mid \gamma_i &\overset{\mathrm{ind.}}{\sim} \mathcal{Ga}\left( \nu, \frac{\nu}{\mu_{ij}} \right) \quad\text{for}\; j = 1,\dots,n_i, \\ \mu_{ij} &= \mathbb{E}(\textup{FCM}_{ij} \mid \gamma_i) = \exp(\eta_{ij}), \\ \eta_{ij} &= \beta_0 + \beta_1 \cdot \textup{number of other relevant hunting events}_{ij} \\ &\quad + f_1(\textup{time difference}_{ij}) + f_2(\textup{distance}_{ij}) \\ &\quad + f_3(\textup{sample delay}_{ij}) + f_4(\textup{defecation day}_{ij}) \\ &\quad + \gamma_{i}, \\ \gamma_i &\overset{\mathrm{iid}}{\sim} \mathcal{N}(0, \sigma_\gamma^2) \end{aligned} \]

    \(f_1, f_2, f_3, f_4\) are penalized cubic regression splines.
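As a brief check on the parameterization, assuming \(\mathcal{Ga}(a, b)\) denotes a Gamma distribution with shape \(a\) and rate \(b\) (the reading under which the stated mean holds), the conditional moments are:

\[ \mathbb{E}(\textup{FCM}_{ij} \mid \gamma_i) = \frac{\nu}{\nu / \mu_{ij}} = \mu_{ij}, \qquad \mathrm{Var}(\textup{FCM}_{ij} \mid \gamma_i) = \frac{\nu}{(\nu / \mu_{ij})^2} = \frac{\mu_{ij}^2}{\nu} \]

The coefficient of variation is therefore constant at \(1/\sqrt{\nu}\). Because of the log link, each additional relevant hunting event multiplies the expected FCM level by \(\exp(\beta_1)\), holding the other terms fixed.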

A. Generalized Additive Mixed Model

Main Results

  • High uncertainty (large standard error) about estimated effects, in particular of time difference and distance, across all datasets.

  • Consistent pattern of sample delay effect when using REML method: larger sample delay \(\Rightarrow\) lower FCM level, as expected.

  • Instability with respect to estimation methods. GCV tends to yield more wiggly smooth effects than REML.

  • Estimation of random intercepts is sensitive to choice of dataset.

B. XGBoost

XGBoost is a gradient boosting algorithm that builds decision trees sequentially, each one correcting the errors of the previous. It improves accuracy with techniques like regularization, shrinkage, and column subsampling, making it efficient and better at generalization.

We chose it because it works well with numerical covariates and has mature, well-tested implementations.

Dataset          Mean RMSE [ng/g]  SD RMSE [ng/g]  Observations
closest in time  168.6336          24.40957        149
nearest          151.3186          17.91780        147
highest score    147.9845          16.50250        223

Agenda

  1. The Background
  2. The Data
  3. The Models
  4. The Wrap-up

Conclusion

  • Due to the high uncertainties, we were not able to detect a relevant effect of spatial or temporal distance on FCM levels.
  • We have observed the expected decay of FCM levels with prolonged time between defecation event and sample collection.

Appendix

GAMM Parametric Effects

Number of Other Relevant Hunting Events

Method  Dataset          Estimate    exp(Estimate)  Standard error
REML    Closest in Time  -0.0917716  0.9123135      0.0631812
REML    Nearest          -0.0742422  0.9284468      0.0614678
REML    Highest Score    -0.0127894  0.9872920      0.0137630
GCV     Closest in Time  -0.1370438  0.8719320      0.0614158
GCV     Nearest          -0.1026115  0.9024775      0.0596574
GCV     Highest Score    -0.0158962  0.9842295      0.0140837
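Since the model uses a log link, exp(Estimate) can be read as a multiplicative effect on the expected FCM level per additional relevant hunting event. The short sketch below recomputes that column from the estimates in the table above and expresses it as a percentage change; the variable names are illustrative.

```python
import math

# (method, dataset, estimate, reported exp(estimate)) copied from the table
rows = [
    ("REML", "Closest in Time", -0.0917716, 0.9123135),
    ("REML", "Nearest", -0.0742422, 0.9284468),
    ("REML", "Highest Score", -0.0127894, 0.9872920),
    ("GCV", "Closest in Time", -0.1370438, 0.8719320),
    ("GCV", "Nearest", -0.1026115, 0.9024775),
    ("GCV", "Highest Score", -0.0158962, 0.9842295),
]

percent_changes = {}
for method, dataset, estimate, reported in rows:
    ratio = math.exp(estimate)  # multiplicative effect on expected FCM
    percent_changes[(method, dataset)] = (ratio - 1.0) * 100.0
    print(f"{method:4s} {dataset:16s} ratio={ratio:.7f} change={percent_changes[(method, dataset)]:+.1f}%")
```

These are point estimates only; the standard errors in the table should be kept in mind when interpreting them.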

GAMM Adjusted Predictions

Figures, one per estimation method and dataset:

  • REML / Closest in Time
  • REML / Nearest
  • REML / Highest Score
  • GCV / Closest in Time
  • GCV / Nearest
  • GCV / Highest Score

GAMM Random Intercepts

Figures, one per estimation method:

  • REML
  • GCV

XGBoost Predicted vs. Actual Plots

Figures, one per dataset:

  • Closest in Time
  • Nearest
  • Highest Score

XGBoost Workflow

  1. Split data: we divide each dataset into training and test sets (75% / 25%).
  2. Set hyperparameter grid: we define a range of starting values for the hyperparameters.
  3. Optimize hyperparameters: we run a cross-validated grid search over hyperparameter combinations and iteratively readjust the grid based on test RMSE until convergence.
  4. Train final model on full data: we train on the entire dataset with the optimized hyperparameters, choosing n_rounds so that the test RMSE stays low and overfitting is limited.
  5. Aggregate results: we run the pipeline 40 times with different seeds and use the average RMSE and average predictions to evaluate overall performance.

We do this separately for all 3 datasets (closest in time, nearest, and highest score).
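The split and aggregation steps (1 and 5) can be sketched without any modelling library. In the sketch below a trivial mean predictor stands in for XGBoost, the data are synthetic, and all function names are hypothetical; hyperparameter search and the final refit are deliberately left out.

```python
import math
import random
import statistics


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def split_indices(n, test_fraction, rng):
    """Shuffle indices and return (train_idx, test_idx)."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = max(1, round(n * test_fraction))
    return idx[n_test:], idx[:n_test]


def fit_mean_model(y_train):
    """Placeholder learner: always predicts the training mean."""
    mean = statistics.fmean(y_train)
    return lambda n: [mean] * n


def repeated_rmse(y, n_runs=40, test_fraction=0.25):
    """Repeat the 75/25 split with different seeds and aggregate the test RMSE."""
    scores = []
    for seed in range(n_runs):
        train_idx, test_idx = split_indices(len(y), test_fraction, random.Random(seed))
        predict = fit_mean_model([y[i] for i in train_idx])
        y_test = [y[i] for i in test_idx]
        scores.append(rmse(y_test, predict(len(y_test))))
    return statistics.fmean(scores), statistics.stdev(scores)


# Synthetic FCM-like values, for illustration only.
data_rng = random.Random(0)
fcm = [data_rng.gammavariate(4.0, 100.0) for _ in range(149)]
mean_rmse, sd_rmse = repeated_rmse(fcm)
print(mean_rmse, sd_rmse)
```

Averaging over many seeds matters here because, with only about 150 to 220 observations per dataset, a single 25% test set is small and its RMSE varies noticeably from split to split.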

Movement and Hunting Events

Faecal Samples and Hunting Events Over Time